1 Introduction

Given the rapid evolution of CosMx SMI data analysis tools, the guidance in this vignette may differ from other published user documents accompanying the platform. Please speak to your NanoString Applications Scientist to discuss your particular project needs.

1.1 Export Module

The export module can be obtained from a NanoString Applications Scientist or from the website. It is loaded into the CosMx SMI Data Analysis Suite in AtoMx SIP. For user documentation related to these platforms, please refer to the Data Export Guide or the User Manual.

The export module can be used at any point in a pipeline on AtoMx. All results up to the point of export will be in the Seurat object and TileDB array. These are equivalent outputs in different formats. We show below how to interact with both objects: TileDB, Seurat.

This format is the same for RNA and Protein studies but the format of the raw files folder will be different depending on the analyte.

It is recommended to only export Raw Files once per study. If multiple pipelines from the same study export Raw File there could be TBs of duplicated files. For this study, the zipped raw files are over 800 GB. These files are not zipped automatically in the export module. An explanation of all important raw files can be found here.

Due to the file size of these data, the export module gives the choice of what to export: TileDB array, Seurat object, and/or Raw Files.

After exporting, depending on parameters selected for export, the data folder can contain:

  • Full Seurat Object containing all previous analyses in AtoMx SIP (if FullSeuratObject == TRUE)
  • Full TileDB array containing all previous analyses in AtoMx SIP
  • If Raw Files exported, all flowcell folders.
    • This study has 2 slides named: SWTA20230803015150 and 14AUG23_Val12_PTL6316_Exp8_1kplex37CPA_SI_S2

1.2 Liver Dataset

We will continue this vignette with data from the Liver Public Data Release.

Two tissue sections from 5µm Formalin-fixed Paraffin Embedded (FFPE) sections:

  • Normal liver, 35yo Male, RINe: 1.4, DV200 58%
  • Hepatocellular Carcinoma (Liver Cancer), stage II, Grade G3, 65yo Female, RINe 2.3, DV200 68%

Scan areas:

  • Normal Liver: 76mm2
  • Cancer Liver: 100mm2

Fields of View (FOV)

  • Normal Liver: 301
  • Cancer Liver: 383

Panel Used: CosMx Human Universal Cell Characterization Panel (1000-plex)

Segmentation markers:

  • Normal Liver: DAPI, CD298/B2M, Ck8/18, PanCK, CD45
  • Cancer Liver: DAPI, CD298/B2M, CD68, PanCK, CD45


2 Read In Export Data

library(ggplot2)
library(ggpubr)
library(ggrepel)
library(gridExtra)
library(matrixStats)
library(patchwork)
library(pheatmap)
library(Seurat)
library(RColorBrewer)
library(reshape2)

2.1 TileDB Array data

“TileDB is a powerful engine for storing and accessing dense and sparse multi-dimensional arrays, which can help you model any complex data efficiently.” TileDB GitHub

In addition to formats which store data in memory, the AtoMx SIP exports support TileDB formats which access and load data as memory is needed. This format, while less commonly used for single-cell projects, allows for scalability to very high-density analysis. As many CosMx studies will be well in excess of 1 million cells, this new format will enable robust and scalable computations across very large studies without requiring all data to be loaded into memory. CosMx SMI datasets can be large enough to quickly run out of RAM when doing analyses if the entire dataset is read into memory. By saving data in TileDB arrays we don’t need to have all of the data in memory, only the specific parts we are actively working on.

TileDBsc is an R implementation of the Stack of Matrices, Annotated (SOMA) API based on TileDB. Using the tiledbsc R package, one can access the data stored as TileDB arrays in the SOMA data model from our analysis pipeline. The TileDB organization, in association with the Chan Zuckerberg Initiative, is currently creating a supplemental library that will enable users to convert the two most popular single cell format, Seurat and AnnData, to and from SOMA to make data analysis in multiple languages like R and python simple. While all of the data are stored in TileDB arrays, it can be saved into a Seurat object to work in a more familiar format and to be able to use existing Seurat functions, see the Seurat Object section below for more information.


For this section of the tutorial, we will be focusing on two packages: TileDB-R and tiledbsc. TileDB-R allows us to use the low-level functions from TileDB like reading and writing matrices and tiledbsc is built on top of this framework specifically for single cell data that allows easy conversion to a Seurat object. f

if (!require("tiledb", quietly = TRUE))
  remotes::install_github("TileDB-Inc/TileDB-R", force = TRUE, 
                          ref = "0.17.0")

if (!require("tiledbsc", quietly = TRUE))
  remotes::install_github("tiledb-inc/tiledbsc", force = TRUE, 
                          ref = "8157b7d54398b1f957832f37fff0b173d355530e")

library(tiledb)
library(tiledbsc)

These packages are still under development but these are the stable versions used in the AtoMx pipeline.

tiledb: 0.17.0
tiledbsc: 0.1.4.9001

Set the tiledbURI variable below to point to the file path of the TileDB folder. For the example below, the TileDB folder is located in LiverDataRelease_TileDBArray folder. You can also set this to an Amazon Web Serivices (AWS) S3 bucket if your TileDB object is stored in an S3 bucket.

# File path to TileDB array on local machine
tiledbURI <- "LiverDataRelease_TileDBArray/"

If your data are in an Amazon S3 bucket (which is currently required for export), we need to set up the environment for TileDB to be able to interact with AWS. Depending on your AWS setup you might need to add your access key, secret key, and session token here as well. Specific instructions on setting up the AWS account can be found here.

# AWS region of the URI
s3Region <- "us-west-2"
aws_creds <- rjson::fromJSON( paste(system2("aws", 
                                             args = c("configure", 
                                                      "export-credentials"), 
                                             stdout = TRUE), collapse = "")) 
# configure tiledb S3
cfg <- tiledb_config() 
cfg["vfs.s3.region"] <- s3Region
cfg["vfs.s3.aws_access_key_id"] <- aws_creds$AccessKeyId
cfg["vfs.s3.aws_secret_access_key"] <- aws_creds$SecretAccessKey
cfg["vfs.s3.aws_session_token"] <- aws_creds$SessionToken
ctx <- tiledb_ctx(cfg)

The main object in TileDB is a Stack of Matrices, Annotated (SOMA). SOMA is an open data model for representing annotated matrices, like those commonly used for single-cell data analysis. A SOMACollection contains multiple SOMAs. In CosMx SMI data, there is a SOMA for targets RNA species, negative control probes, and FalseCodes.

For protein datasets, count data are currently stored in RNA SOMA. We are currently investigating creating a separate SOMA specifically for protein and may come in a latter release.

An example of the basic TileDB data structure can be seen below:

directory
|-- __group
|   |-- __data_specific_alphanumeric_name_here
|   |-- __data_specific_alphanumeric_name_here
|-- __meta
|   |-- __data_specific_alphanumeric_name_here
|-- __tiledb_group.tdb
|-- soma_RNA
|   |-- X
|   |-- __group
|   |-- __meta
|   |-- __tiledb_group.tdb
|   |-- obs
|   |-- obsm
|   |-- obsp
|   |-- uns
|   |-- var
|   |-- varm
|   |-- varp
|-- uns
    |-- __group
    |-- __meta
    |-- __tiledb_group.tdb
    |-- commands

These files are not touched except through TileDB accessor functions, shown below.

2.1.1 Reading into R

You can read the arrays stored in the TileDB object using the tiledbsc R package which offers accesor functions to easily read the data into memory or access only.

# read in SOMACollection
tiledb_scdataset <- tiledbsc::SOMACollection$new(uri = tiledbURI, 
                                                 verbose = FALSE)

The TileDB object is a collection of pointers to the RNA counts, normalized RNA counts, negative control probes, and falsecode SOMAs. Each SOMA follows the AnnData shape.

For protein datasets, count data are currently stored in RNA SOMA. We are currently investigating creating a separate SOMA specifically for protein and may come in a later release.

AnnData Shape

names(tiledb_scdataset$somas)
## [1] "falsecode"      "negprobes"      "RNA"            "RNA_normalized"
names(tiledb_scdataset$somas$RNA$members)
## [1] "X"    "varp" "obs"  "varm" "obsp" "uns"  "obsm" "var"

Having an object of pointers allows for small memory usage. This TileDB array with an entire study only used 3.23 MB of memory even though it contains nearly 800,000 cells!

The main parts of CosMx SMI data are:

  1. Cell-by-target counts which are stored in the X slot
  2. Cell metadata which is stored in the obs slot
  3. Target transcript coordinates which is stored in the obsm slot
  4. Results from analysis modules which are stored in various slots and will be detailed below

These parts of the data can be loaded into memory separately or all together.

Individual TileDB objects might have different orders of cells. Please pay attention to these orders.

2.1.1.1 Cell-by-target counts

All matrices in TileDB are stored as sparse matrices. This structure is used for querying the data.

Count matrices currently have targets on rows and cells on columns like a standard Seurat object. In future TileDB releases, these will be transposed to look more like AnnData. This change will decrease load times in TileDB but any coercion to Seurat, shown below, will not be affected.

2.1.1.1.1 Raw Counts

The raw data cell-by-target expression data are stored in the X slot and can be retrieved and stored in memory using the below code.

# batch_mode parallelizes the readin, decreasing computation time
counts <- tiledb_scdataset$somas$RNA$X$members$counts$to_matrix(batch_mode = TRUE) 
dim(counts)
## [1]   1000 793318
counts[1:4,1:4]
## 4 x 4 sparse Matrix of class "dgTMatrix"
##       c_1_100_1000 c_1_100_1017 c_1_100_1028 c_1_100_1043
## AATK             1            1            2            1
## ABL1             .            .            .            .
## ABL2             .            .            .            .
## ACACB            .            .            .            .

Memory used: 2.4 GB

2.1.1.1.2 Normalized Counts

Use the code below to retrieve and save the normalized counts in memory.

These data were normalized using Pearson Residuals to account for library size factors to ensure that cell specific total transcript abundance and distribution of counts, which may vary some between FOVs and samples. See Normalization Section for more details.

In future releases normalized data will be located in tiledb_scdataset$somas$RNA$X$members$data and not in a separate SOMA.

norm <- tiledb_scdataset$somas$RNA_normalized$X$members$data$to_matrix(batch_mode = TRUE)
dim(norm)
## [1]   1000 793318
norm[1:4,1:4]
## 4 x 4 sparse Matrix of class "dgTMatrix"
##        c_1_100_1  c_1_100_10 c_1_100_100 c_1_100_1000
## AATK  -0.1429034 -0.06889529  -0.2645790    7.5780320
## ABL1  -0.1972264 -0.09509172  -0.3650734   -0.1790312
## ABL2  -0.1667603 -0.08039924  -0.3087212   -0.1513743
## ACACB -0.2951553 -0.14233400  -0.5460241   -0.2679370

Memory used: 12.75 GB

2.1.1.2 Cell Metadata

Cell-level metadata are stored in obs slot. It consists of cell information read from the output of CosMx SMI. Some worth noting are the centroid x-y locations, flow cell, fov and other cell stats from the Cell_Stats csv files from raw data inputs. The obs slot will also store results from modules run from the pipeline. Some examples are the QC module and clustering modules. The code below can be used to retrieve and to store the cell metadata from obs slot into memory.

metadata <- tiledb_scdataset$somas$RNA$obs$to_dataframe()
dim(metadata)
## [1] 793318     67
metadata[1:4,1:10]
##              orig.ident nCount_RNA nFeature_RNA nCount_negprobes
## c_1_100_1             c        142           77                0
## c_1_100_10            c         33           26                0
## c_1_100_100           c        487          143                0
## c_1_100_1000          c        117           69                0
##              nFeature_negprobes RNA_pca_cluster_default
## c_1_100_1                     0                      16
## c_1_100_10                    0                      13
## c_1_100_100                   0                       3
## c_1_100_1000                  0                      13
##              RNA_pca_cluster_default.1 nCount_falsecode nFeature_falsecode fov
## c_1_100_1                           20                0                  0 100
## c_1_100_10                          15                5                  5 100
## c_1_100_100                          7                1                  1 100
## c_1_100_1000                        16                2                  2 100

Memory used: 447.44 MB


If you do not want all the cell metadata to be read into memory, you can read specific columns by specifying the attrs parameter. We can use this metadata to visualize the tissue structure by specifically plotting where each cell is in physical coordinate space.

head(tiledb_scdataset$somas$RNA$obs$attrnames())
## [1] "orig.ident"              "nCount_RNA"             
## [3] "nFeature_RNA"            "nCount_negprobes"       
## [5] "nFeature_negprobes"      "RNA_pca_cluster_default"
cellCoords <- tiledb_scdataset$somas$RNA$obs$to_dataframe(
  attrs = c("x_FOV_px", "y_FOV_px", "x_slide_mm", "y_slide_mm", 
            "slide_ID_numeric", "Run_Tissue_name", "fov"))
head(cellCoords)
##              x_FOV_px y_FOV_px x_slide_mm y_slide_mm slide_ID_numeric
## c_1_100_1          42       36    8.70804    9.73368                1
## c_1_100_10       2737       25    9.03144    9.73500                1
## c_1_100_100      2888      409    9.04956    9.68892                1
## c_1_100_1000     3457     3742    9.11784    9.28896                1
## c_1_100_1001     1298     3763    8.85876    9.28644                1
## c_1_100_1002     1001     3738    8.82312    9.28944                1
##              Run_Tissue_name fov
## c_1_100_1        NormalLiver 100
## c_1_100_10       NormalLiver 100
## c_1_100_100      NormalLiver 100
## c_1_100_1000     NormalLiver 100
## c_1_100_1001     NormalLiver 100
## c_1_100_1002     NormalLiver 100
ggplot(cellCoords, aes(x=x_slide_mm, y=y_slide_mm))+
  geom_point(alpha = 0.05, size = 0.01)+
  facet_wrap(~Run_Tissue_name)+
  coord_equal()+
  labs(title = "Cell coordinates in XY space")

2.1.1.3 Target Transcript Coordinates

Transcript coordinates are x-y locations of individual transcript. Note that one cell contains many transcripts. These are currently stored in obsm slot. In future releases transcript coordinates will be stored in the uns slot: tiledb_scdataset$somas$RNA$uns$members$transcriptCoords

To read the entire transcript coordinate data frame into memory:

transcriptCoords <- tiledb::tiledb_array(
  tiledb_scdataset$somas$RNA$obsm$members$transcriptCoords$uri,
  return_as="data.frame")[]
head(transcriptCoords)

In many cases, having all of the transcript coordinates in memory is too large. TileDB allows for querying of the data to significantly decrease the amount of data read into memory. Here we only want the transcripts in FOV 15 to plot later.

fov <- 15

qc <- tiledb_query_condition_init(attr = "fov",
                                  value = fov,
                                  dtype = "INT32",
                                  op = "EQ")
transcriptCoords <- tiledb_array(tiledb_scdataset$somas$RNA$obsm$members$transcriptCoords$uri, 
                                 query_condition=qc, return_as="data.frame")[]
head(transcriptCoords)
##   slideID fov x_FOV_px y_FOV_px z_FOV_slice  target CellId     cell_id
## 1       1  15     1375        9           1   HMGN2   1271 c_1_15_1271
## 2       1  15       25       10           6  MALAT1      1    c_1_15_1
## 3       1  15       25       10           5    XBP1      1    c_1_15_1
## 4       1  15       36       10           3 SLCO2B1      1    c_1_15_1
## 5       1  15      167       10           6   RPL32     23   c_1_15_23
## 6       1  15      201       10           7   APOA1     23   c_1_15_23
##    CellComp
## 1   Nuclear
## 2   Nuclear
## 3   Nuclear
## 4   Nuclear
## 5 Cytoplasm
## 6 Cytoplasm

One can visualize this by plotting transcripts from an individual sample (flow cell 1, normal liver) and a specific region within the tissue (FOV 15). In the graph below the centroid of each cell is shown as a black point and each transcript is shown as a colored point.

slide <- 1
fov <- 15

slideName <- unique(cellCoords$Run_Tissue_name[cellCoords$slide_ID_numeric == 
                                                 slide])

fovCoords <- cellCoords[cellCoords$slide_ID_numeric == slide & 
                          cellCoords$fov == fov,]
fovTranscriptCoords <- transcriptCoords[transcriptCoords$slideID == slide & 
                                          transcriptCoords$fov == fov,]

targetCounts <- table(fovTranscriptCoords$target)

targets <- names(targetCounts[which(targetCounts >= 2500 & 
                                      targetCounts <= 3000)])
fovTranscriptCoords <- fovTranscriptCoords[fovTranscriptCoords$target %in% 
                                             targets,]

ggplot(fovCoords, aes(x=x_FOV_px, y=y_FOV_px))+
  geom_point(size = 0.5, color = "black")+
  geom_point(data = fovTranscriptCoords, 
             mapping = aes(x=x_FOV_px, 
                           y=y_FOV_px, 
                           color = target), 
             size = 0.6, alpha = 0.6)+
  theme_bw()+
  coord_equal()+
  guides(colour = guide_legend(override.aes = list(size=1,
                                                   alpha=1)))+
  labs(color = "RNA Target", title = paste0("RNA Transcripts in\n", 
                                            slideName, "\nFOV", fov))

2.1.1.4 Analysis Results

In addition to storing data about count information, the AtoMx SIP enables downstream analysis as well including methods such as differential expression and ligand-receptor analysis. Metadata from these analyses can also be included in the exported results from AtoMx. See Analysis Results Section for full description of each analysis module that can be run in AtoMx SIP, manual found here.

The results from analysis modules can be found in slots dependent on their output shape. They follow the AnnData object shape shown above. Many of the results names from an AtoMx export will have a globally unique identifier (GUID) attached to their name. GUIDs are used to ensure running multiple modules in the same pipeline don’t overwrite the results. Each module has it’s own GUID meaning that results from the same module run can be grouped.

slot type of data results
obs cell metadata
obsm data on cells requiring more than 1 column PathwayCellsAUC, dimreduction_pca, dimreduction_approximateumap, latest.fovs, RNA_nbclust_loglikes, RNA_normalized_cluster_normExpr, cellScores, transcriptCoords, RNA_normalized_markers_result, RNA_normalized_top_markers, dimreduction_approximateUMAP_bySlide
obsp paired result with cells on rows and columns graph_RNA_pca_snn, graph_RNA_pca_nn, spatial_distances, RNA_spatialDE_W
var target metadata
varm data on targets requiring more than 1 column dimreduction_pca
varp paired result with targets on rows and columns RNA_LeesLRes, RNA_nbclust_log_likelyhood_profiles
uns unstructured data, doesn’t have to follow cells or targets spatEnrichRes, cellTypeScores, cellTypePairs, linkScores, cellProximity

Transcript Coordinates are currently in obsm but will be moved to uns in a later release

Here is a small example on how to read in two results arrays. All results follow a similar read-in pattern, the only thing that needs to change is the slot and result matrix names:

tiledb_scdataset$somas$RNA$<put slot name here>$members$<put result matrix name here>$to_matrix()

To get the result for PCA, note from the table above, PCA is saved in the obsm slot and the name of the result matrix is dimreduction_pca:

tiledb_scdataset$somas$RNA$obsm$members$dimreduction_pca$to_matrix()

pca <- tiledb_scdataset$somas$RNA$obsm$members$dimreduction_pca$to_matrix()
pca[1:4,1:4]
##                   PCA_1       PCA_2       PCA_3      PCA_4
## c_1_100_1    -1.6417182 -2.33161620 -2.99827387 -0.1049817
## c_1_100_10   -1.1434024 -0.08528228 -1.04553850  0.3209358
## c_1_100_100  -7.0135208  0.52200934  0.09413956  0.1917221
## c_1_100_1000 -0.2238483 -2.22115908 -0.78119275  1.5374492
neighbs <- tiledb_scdataset$somas$RNA$obsp$members$graph_RNA_pca_nn$to_seurat_graph()
neighbs[1:4,1:25]
## 4 x 25 sparse Matrix of class "dgCMatrix"
##                                                               
## c_1_100_1    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 . . . . .
## c_1_100_10   . . . . . . . . . . . . . . . . . . . . 1 1 1 1 1
## c_1_100_100  . . . . . . . . . . . . . . . . . . . . . . . . .
## c_1_100_1000 . . . . . . . . . . . . . . . . . . . . . . . . .

2.1.1.5 Read Data into Seurat

The TileDB objects can also be read into Seurat objects. The TileDBsc native to_seurat function does not read in data from slots like uns so some analysis results will not be attached to the Seurat object and need to be read in individually from the TileDB array like shown above.

# Seurat object with all SOMAs 
# seurat <- tiledb_scdataset$to_seurat(batch_mode = TRUE)
# seurat

RNA_seurat <- tiledb_scdataset$to_seurat(somas = c("RNA"), batch_mode = TRUE)
RNA_seurat
## An object of class Seurat 
## 1000 features across 793318 samples within 1 assay 
## Active assay: RNA (1000 features, 0 variable features)
##  3 dimensional reductions calculated: pca, approximateumap, approximateUMAP_bySlide

These Seurat objects can be used in Seurat functions for calculations or visualizations.

RNA_seurat@meta.data <- metadata
Idents(RNA_seurat) <- RNA_seurat$cellType

markers <- FindMarkers(RNA_seurat, ident.1 = unique(Idents(RNA_seurat))[1L],
                       logfc.threshold = 0.25, test.use = "roc",
                       only.pos = TRUE)

VlnPlot(RNA_seurat, features = head(rownames(markers), 2),
        log = TRUE, pt.size = 0)

While a Seurat object might be easier to use, please be aware that reading an entire dataset into memory can be RAM exhaustive. This Seurat object requires 3.19 GB of RAM vs the TileDB array which only uses 3.27 MB at loading.

2.1.2 Reading into Python

TileDB objects can also be read into python.

This section is in R and only used to set up python environment.

# install packages to use python in R
if (!require("rminiconda", quietly = TRUE))
  remotes::install_github("hafen/rminiconda")

library(rminiconda)
library(reticulate)

# create python environment
if(!rminiconda::is_miniconda_installed("tiledb_python")){
  rminiconda::install_miniconda(version = 3, name = "tiledb_python")
}

# use created python environment
py <- rminiconda::find_miniconda_python("tiledb_python")
reticulate::use_python(py, required = TRUE)

# install necessary python packages into environment
for(i in c("tiledb", "anndata", "tiledbsoma==0.1.19", 
           "scanpy", "gc-python-utils")){
  rminiconda::rminiconda_pip_install(pkg_name = i, name = "tiledb_python")
}

The rest of this section is in python.

import tiledb
import tiledbsoma
from anndata import AnnData
import scanpy as sc

tiledbURI = "LiverDataRelease_TileDBArray/"
s3Region = "us-west-2"

# set up s3 environment
config = tiledb.Config()
config.update({"vfs.s3.region" : s3Region})
# config.update({"vfs.s3.aws_access_key_id" : ""})
# config.update({"vfs.s3.aws_secret_access_key" : ""})
# config.update({"vfs.s3.aws_session_token" : ""})
ctx = tiledb.Ctx(config)

# read in SOMACollection
pySoma = tiledbsoma.SOMACollection(tiledbURI, ctx=ctx)
pySoma.keys()
## ['RNA_normalized', 'RNA', 'negprobes', 'falsecode', 'uns']

2.1.2.1 Cell-by-target counts

All matrices in TileDB are stored as sparse matrices. This structure is used for querying the data.

Count matrices currently have targets on rows and cells on columns like a standard Seurat object. In future TileDB releases, these will be transposed to look more like AnnData. This change will decrease load times in TileDB but any coercion to Seurat will not be affected.

2.1.2.1.1 Raw Counts
counts = pySoma['RNA'].X['counts'].csr()
counts
## <793318x1000 sparse matrix of type '<class 'numpy.float64'>'
##  with 146503383 stored elements in Compressed Sparse Row format>
2.1.2.1.2 Normalized Counts

These data were normalized using Pearson Residuals to account for library size factors to ensure that cell specific total transcript abundance and distribution of counts, which may vary some between FOVs and samples. See Normalization Section for more details.

In future releases normalized data will be located in pySoma['RNA'].X['data'] and not in a separate SOMA.

norm = pySoma['RNA_normalized'].X['data'].csr()
norm
## <793318x1000 sparse matrix of type '<class 'numpy.float64'>'
##  with 793317999 stored elements in Compressed Sparse Row format>

2.1.2.2 Cell Metadata

Cell-level metadata are stored in obs slot. It consists of cell information read from the output of CosMx SMI. Some worth noting are the x-y locations, flow cell, fov and other cell stats from the Cell_Stats csv files from raw data inputs. The obs slot will also store results from modules run from the pipeline. Some examples are the QC module and clustering modules. The code below can be used to retireve and to store the cell metadata from obs slot into memory.

obs = pySoma['RNA'].obs.df()
obs.head()
##              orig.ident  nCount_RNA  ...  i.median_RNA  i.RNA_quantile_0.9
## obs_id                               ...                                  
## c_1_100_1             c       142.0  ...         110.0               690.6
## c_1_100_10            c        33.0  ...         110.0               690.6
## c_1_100_100           c       487.0  ...         110.0               690.6
## c_1_100_1000          c       117.0  ...         110.0               690.6
## c_1_100_1001          c       320.0  ...         110.0               690.6
## 
## [5 rows x 67 columns]

2.1.2.3 Target Transcript Coordinates

Transcript coordinates are x-y locations of individual transcript. Note that one cell contains multiple transcripts. These are currently stored in obsm slot. In future releases transcript coordinates will be located here: pySoma['RNA'].uns["transcriptCoords"].uri

transcriptCoords = tiledb.open_dataframe(pySoma['RNA'].obsm["transcriptCoords"].uri, ctx=ctx)
transcriptCoords

In many cases, having all of the transcript coordinates in memory is too large. TileDB allows for querying of the data to significantly decrease the amount of data read into memory. Here we only want the transcripts in FOV 15.

with tiledb.open(pySoma['RNA'].obsm["transcriptCoords"].uri, mode="r", ctx=ctx) as A:
    q = A.query(cond="fov == 15")

    print(q.df[:,:])
##          slideID  fov  x_FOV_px  ...  CellId      cell_id   CellComp
## 0              1   15      1375  ...    1271  c_1_15_1271    Nuclear
## 1              1   15        25  ...       1     c_1_15_1    Nuclear
## 2              1   15        25  ...       1     c_1_15_1    Nuclear
## 3              1   15        36  ...       1     c_1_15_1    Nuclear
## 4              1   15       167  ...      23    c_1_15_23  Cytoplasm
## ...          ...  ...       ...  ...     ...          ...        ...
## 1409703        2   15      2226  ...     590   c_2_15_590   Membrane
## 1409704        2   15      2237  ...     590   c_2_15_590  Cytoplasm
## 1409705        2   15      2477  ...     278   c_2_15_278   Membrane
## 1409706        2   15      2497  ...     285   c_2_15_285  Cytoplasm
## 1409707        2   15      3143  ...     586   c_2_15_586  Cytoplasm
## 
## [1409708 rows x 9 columns]

2.1.2.4 Analysis Results

See R read in section for full list of analysis results and Analysis Results Section for full description of module that can be run in AtoMx SIP.

pcad = pySoma['RNA'].obsm['dimreduction_pca'].df()
pcad.head()
##                  PCA_1     PCA_2     PCA_3  ...    PCA_48    PCA_49    PCA_50
## obs_id                                      ...                              
## c_1_100_1    -1.641718 -2.331616 -2.998274  ...  0.208275 -0.535591 -0.390931
## c_1_100_10   -1.143402 -0.085282 -1.045539  ...  0.334928  0.035539  0.606723
## c_1_100_100  -7.013521  0.522009  0.094140  ...  1.244416  0.536143  1.016428
## c_1_100_1000 -0.223848 -2.221159 -0.781193  ...  2.616751 -0.784868 -0.295499
## c_1_100_1001 -3.051702 -4.481670  0.119539  ... -0.903006 -0.003687  1.911351
## 
## [5 rows x 50 columns]

These data can easily be coerced into AnnData format that can be plugged into squidpy or other python packages.

coordinates = obs[["x_slide_mm", "y_slide_mm"]]
adata = AnnData(norm, obs = obs, obsm={"spatial": coordinates}, dtype = "float32")
adata
## AnnData object with n_obs × n_vars = 793318 × 1000
##     obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'nCount_negprobes', 'nFeature_negprobes', 'RNA_pca_cluster_default', 'RNA_pca_cluster_default.1', 'nCount_falsecode', 'nFeature_falsecode', 'fov', 'Area', 'AspectRatio', 'Width', 'Height', 'Mean.PanCK', 'Max.PanCK', 'Mean.CK8.18', 'Max.CK8.18', 'Mean.Membrane', 'Max.Membrane', 'Mean.CD45', 'Max.CD45', 'Mean.DAPI', 'Max.DAPI', 'cell_id', 'assay_type', 'Run_name', 'slide_ID_numeric', 'Run_Tissue_name', 'Panel', 'Mean.Yellow', 'Max.Yellow', 'Mean.CD298_B2M', 'Max.CD298_B2M', 'cell_ID', 'x_FOV_px', 'y_FOV_px', 'x_slide_mm', 'y_slide_mm', 'propNegative', 'complexity', 'errorCtEstimate', 'percOfDataFromError', 'qcFlagsRNACounts', 'qcFlagsCellCounts', 'qcFlagsCellPropNeg', 'qcFlagsCellComplex', 'qcFlagsCellArea', 'median_negprobes', 'negprobes_quantile_0.9', 'median_RNA', 'RNA_quantile_0.9', 'nCell', 'nCount', 'nCountPerCell', 'nFeaturePerCell', 'propNegativeCellAvg', 'complexityCellAvg', 'errorCtPerCellEstimate', 'percOfDataFromErrorPerCell', 'qcFlagsFOV', 'cellType', 'niche', 'i.median_negprobes', 'i.negprobes_quantile_0.9', 'i.median_RNA', 'i.RNA_quantile_0.9'
##     obsm: 'spatial'

2.1.3 TileDB Documentation

Other documentation for accessing TileDB objects can be found here:

2.2 Seurat Object data

2.2.1 Reading In Object

Because the TileDBsc native to_seurat function does not attach data from slots like uns we also export data in a native Seurat format. The AtoMx SIP export function has a Seurat converter that attaches all results to Seurat object if FullSeuratObject == TRUE, and future releases will be focused on expanding compatibility of TileDB data with Seurat.

If FullSeuratObject == FALSE, the object will contain count data, metadata, and dimension reduction results. Here we will walk through the resulting full Seurat object.

Working off of S3 paths are a little trickier than using local files. Here is an example of how to read in an RDS file from S3. If you are using local data you can simply use readRDS().

fullSeuratPath <- "LiverDataReleaseSeurat.RDS"
bucket <- strsplit(fullSeuratPath,"/")[[1L]][3L]
file_key <- strsplit(fullSeuratPath, paste0(bucket,"/"))[[1L]][2L]
system2(command = "aws", arg = c("s3api", "get-object","--bucket",bucket,
                                 "--key", file_key,paste0("/tmp/tmp.RDS")),
        stdout = FALSE)
fullSeurat <- readRDS("/tmp/tmp.RDS")
rm("/tmp/tmp.RDS")

# fullSeurat <- readRDS(fullSeuratPath)
fullSeurat
## An object of class Seurat 
## 1197 features across 793318 samples within 3 assays 
## Active assay: RNA (1000 features, 0 variable features)
##  2 other assays present: falsecode, negprobes
##  3 dimensional reductions calculated: pca, approximateumap, approximateUMAP_bySlide

While a Seurat object can be easier to use, please be aware that reading an entire dataset into memory can be RAM exhaustive. This full Seurat object, containing all of the count data, metadata, and analysis results data are 50.03 GB in size.

2.2.1.1 Cell-by-target counts

2.2.1.1.1 Raw Counts
fullSeurat[["RNA"]]@counts[1:4,1:4]
## 4 x 4 sparse Matrix of class "dgCMatrix"
##       c_1_100_10 c_1_100_1078 c_1_100_1135 c_1_100_267
## AATK           .            1            .           .
## ABL1           .            2            .           .
## ABL2           .            .            .           .
## ACACB          .            2            .           1
2.2.1.1.2 Normalized Counts

These data were normalized using Pearson Residuals to account for library size factors to ensure that cell specific total transcript abundance and distribution of counts, which may vary some between FOVs and samples. See Normalization Section for more details.

One major difference between our custom toSeurat function is that the normalized data are in the expected location for later releases. It is located in the RNA assay under the data slot instead of in its own assay.

fullSeurat[["RNA"]]@data[1:4,1:4]
## 4 x 4 sparse Matrix of class "dgCMatrix"
##        c_1_100_10 c_1_100_1078 c_1_100_1135 c_1_100_267
## AATK  -0.06889529    1.5266149   -0.1577291  -0.3114469
## ABL1  -0.09509172    2.2435234   -0.2176835  -0.4296907
## ABL2  -0.08039924   -0.5759492   -0.1840596  -0.3633908
## ACACB -0.14233400    0.9318722   -0.3257528   0.9076156

2.2.1.2 Cell Metadata

fullSeurat@meta.data[1:4,1:10]
##              orig.ident nCount_RNA nFeature_RNA nCount_negprobes
## c_1_100_10            c         33           26                0
## c_1_100_1078          c       1699          295                0
## c_1_100_1135          c        173          131                1
## c_1_100_267           c        675          204                2
##              nFeature_negprobes RNA_pca_cluster_default
## c_1_100_10                    0                      13
## c_1_100_1078                  0                       7
## c_1_100_1135                  1                      12
## c_1_100_267                   2                      15
##              RNA_pca_cluster_default.1 nCount_falsecode nFeature_falsecode fov
## c_1_100_10                          15                5                  5 100
## c_1_100_1078                        11                7                  7 100
## c_1_100_1135                        10                9                  8 100
## c_1_100_267                         17                3                  3 100

2.2.1.3 Target Transcript Coordinates

head(fullSeurat@misc$transcriptCoords)
##   slideID fov x_FOV_px y_FOV_px z_FOV_slice       target CellId     cell_id
## 1       2  77      212        9           2  FalseCode23    460  c_2_77_460
## 2       2  58      565        9           6 FalseCode197    503  c_2_58_503
## 3       2 301     1160        9           4 FalseCode197    438 c_2_301_438
## 4       1   8     2183        9           7  FalseCode94     11    c_1_8_11
## 5       2  34     2364        9           0 FalseCode184      7    c_2_34_7
## 6       2  26     2467        9           3 FalseCode199      8    c_2_26_8
##    CellComp
## 1 Cytoplasm
## 2   Nuclear
## 3 Cytoplasm
## 4 Cytoplasm
## 5  Membrane
## 6 Cytoplasm

In SMIDA v1.3, transcript coordinates in the Seurat object will be added to the slot to allow for easy integration to Seurat vignettes.

2.2.1.4 Analysis Results

See Analysis Results Section for full description of module that can be run in AtoMx SIP. Many of the results names from an AtoMx export will have a globally unique identifier (GUID) attached to their name.

slot type of data results
meta.data cell metadata
graphs nearest neighbor graphs graph_RNA_pca_snn, graph_RNA_pca_nn, spatial_distances, RNA_spatialDE_W
reductions DimReduc objects pca, approximateumap, approximateUMAP_bySlide
misc additional data on cells transcriptCoords, latest.fovs, PathwayCellsAUC, RNA_nbclust_loglikes, RNA_normalized_cluster_normExpr, cellScores, RNA_normalized_markers_result, RNA_normalized_top_markers, spatEnrichRes, cellTypeScores, cellTypePairs, linkScores, cellProximity_cellproximity_heatmap, cellProximity_cellproximity_pcf, cellProximity_cellproximity_which
assay@misc additional data on target RNA_LeesLRes, RNA_nbclust_log_likelyhood_profiles
assay@meta.features target metadata

2.2.2 Seurat Documentation

2.3 Comparison of Methods

In this section we will demonstrate how to perform several common actions using TileDB in R and python, as well as with Seurat in R.

2.3.1 UMAP

This section shows how to plot a UMAP colored by the cell type metadata column. We create a simple plotting function to be used in multiple sections below (plotting).

colorColumn = "cellType"

Read in UMAP from TileDB R

umap_TileDB <- as.data.frame(tiledb_scdataset$somas$RNA$obsm$members$dimreduction_approximateumap$to_matrix())
umap_TileDB$colorBy <- tiledb_scdataset$somas$RNA$obs$to_dataframe(attrs = colorColumn)[[1L]]

Read in UMAP from TileDB python

umap_py = pySoma['RNA'].obsm['dimreduction_approximateumap'].df()
metadata = pySoma['RNA'].obs.df()
umap_py["colorBy"] = metadata[r.colorColumn]

Read in UMAP from Seurat

umap_Seurat <- as.data.frame(fullSeurat@reductions$approximateumap@cell.embeddings)
umap_Seurat$colorBy <- fullSeurat[[colorColumn]][match(rownames(umap_Seurat), 
                                                       rownames(fullSeurat@meta.data)), 1]
colrs <- brewer.pal.info[brewer.pal.info$colorblind == TRUE, ]
colorSchemes <- c("PuOr", "Dark2", "Set2", "BrBG")
colrs <- colrs[colorSchemes,]
col_vec <- unique(unlist(mapply(brewer.pal, colrs$maxcolors, colorSchemes)))

col_vec <- col_vec[-grep(col_vec, pattern = "#F|#E|#D")]

plotting <- function(data, title, Xcol, Ycol, Xname, Yname, color, 
                     colorName, size = 0.02, alpha = 0.05){
  gp <- ggplot(data, aes(x = .data[[Xcol]], y = .data[[Ycol]], 
                         color = .data[[color]]))+
    geom_point(size = size, alpha = alpha)+
    coord_equal()+
    labs(title = title, color = colorName,
         x = Xname, y = Yname)+
    scale_color_manual(values = col_vec)+
    guides(colour = guide_legend(override.aes = list(size=1,
                                                     alpha=1),
                                 ncol = 1))+
    theme(legend.text=element_text(size=12))
  
  return(gp)
}

umapPlot <- function(data, title, colorBy = "colorBy", colorColumn = "Cell Type", ...){
  return(plotting(data, title, Xcol = "APPROXIMATEUMAP_1", 
                  Ycol = "APPROXIMATEUMAP_2", Xname = "UMAP1", 
                  Yname = "UMAP2", color = colorBy, colorName = colorColumn,
                  ...))
}

pcaPlot <- function(data, title, colorBy = "colorBy", colorColumn = "Cell Type", ...){
  return(plotting(data, title, Xcol = "PCA_1", 
                  Ycol = "PCA_2", Xname = "PCA_1", 
                  Yname = "PCA_2", color = colorBy, colorName = colorColumn,
                  ...))
}

xySlidePlot <- function(data, title, colorBy = "colorBy", colorColumn = "Cell Type", ...){
  return(plotting(data, title, Xcol = "x_slide_mm", 
                  Ycol = "y_slide_mm", Xname = "x_slide_mm", 
                  Yname = "y_slide_mm", color = colorBy, colorName = colorColumn,
                  ...))
}

All of the UMAPs are the same data regardless of how you read in the data.

Readin method Time Memory
TileDB R 4.991 seconds 22.31 MB
TileDB python 5.623 seconds 19.04 MB
Full Seurat 14.199 minutes 50.03 GB
umapGP_TileDB <- umapPlot(umap_TileDB, title = "TileDB - R") 
umapGP_python <- umapPlot(reticulate::py$umap_py, title = "TileDB - python") 
umapGP_Seurat <- umapPlot(umap_Seurat, title = "Seurat") 

ggpubr::ggarrange(umapGP_TileDB,umapGP_python, umapGP_Seurat,
          common.legend = TRUE, legend = "right", ncol = 1)

3 NanoString Analysis Pipeline

These are examples of current and future analysis modules available in AtoMx SIP. They provide analytical methods for adding results to a TileDB study. Please note that each study may only use a subset of these modules, and that users can create their own analysis modules which may have different formats and data specifications. Each section below shows how to work with each of the resulting features in the TileDB and Seurat objects (only available in v1.1 of AtoMx export). Manual can be found here.

Any module that was run before export, will be available in the TileDB and Seurat, when FullSeuratObject == TRUE. If FullSeuratObject == FALSE, the object will contain count data, metadata, and dimension reduction results. All modules can be run with either RNA and protein unless otherwise stated.

The results names generated by CosMx SMI Data Analysis differ slightly from the names in the liver dataset described here. Most results exported will have a GUID as part of the result name.

3.1 Study Creation

  • Data are loaded into TileDB array from flat files. Cells are globally aligned for downstream spatial analyses.
  • Output:
    • Cell metadata:
      • TileDB: tiledb_scdataset$somas$RNA$obs
      • Seurat: seurat@meta.data
    • Cell count matrix:
      • TileDB: tiledb_scdataset$somas$RNA$X$members$counts
      • Seurat: seurat[["RNA"]]
    • Cell transcript coordinates:
      • TileDB: tiledb_scdataset$somas$RNA$obsm$memebers$transcriptCoords
      • Seurat: seurat@misc$transcriptCoords
    • Target metadata:
      • TileDB: tiledb_scdataset$somas$RNA$var
      • Seurat: seurat[["RNA"]]@meta.features

3.2 Spatial Network

  • Create a network or graph structure of the physical distribution of cells. Cells are converted to nodes in the graph, and connections between cells (e.g. nearest neighbors) are represented as edges. Network can be built in one of three ways: radius-based (all cells connected within a given radius), nearest neighbors, or Delaunay triangulation.
  • Output:
    • TileDB: tiledb_scdataset$somas$RNA$obsp$members$spatialdistances_[GUID]
    • Seurat: seurat@graphs$spatialdistances_[GUID]

3.3 Quality Control

  • RNA: This module is to flag unreliable negative probes, cells, FOVs, and target genes. The user can choose to remove those flagged negative probes, cells, target genes, and FOVs and generate a filtered dataset which is the input of the down-stream analyses.
    • Negative Probe QC: can be used to remove negative control probes which appear to be behaving like outliers in your tissue. This helps prevent tissue-specific or sample-specific background effects from impacting other QC metrics.

    • Cell level QC: looks for cells which specific characteristics and removes them. These characteristics include: number of targets detected (higher is better), fraction of probes which are negative controls (lower is better), uniformity of count distribution (higher complexity is preferred), cell size (top % of cell sizes may need to be removed as a QC of segmentation). You may also manually provide a annotation to filter on if there is a cluster of cells you would like removed later in analysis.

    • FOV QC: identifies FOVs which have generally low expression. There are two approaches available. The “mean” method involves filtering out FOVs where the total count per cell averages below a threshold (default: 100). The “quantile” method involves filtering out FOVs where a specified quantile (default: 90th percentile) of gene expression is not above the median of the negative probes.

    • Target level QC: checks to see if a target appears to be above background across your dataset based on probe distribution relative to negative control probes included in the assay.

  • Protein: This module is to flag unreliable cells based on segmented cell area, negative probe expression, and high/low target expression. Specifically cells with outlier Grubb’s test p-values (p<0.05) for segmented area are flagged. Cells with mean negative probe values below the lower threshold (default = 2) or above the upper threshold (default = 50) are flagged. Cells where at least a% of proteins in the bth quantile (default a = 0.5, b = 0.9) are considered cells with overly high target expression, and therefore flagged. Cells where fewer than c proteins are in at least the dth quantile (default c = 3, d = 0.5) are considered cells with low expression, and therefore flagged.
  • Output:
    • Failing cells are flagged in cell metadata
      • colnames may change due to changes in QC variables
      • RNA colnames = c(“propNegative”, “complexity”, “errorCtEstimate”, “percOfDataFromError”, “qcFlagsRNACounts”, “qcFlagsCellCounts”, “qcFlagsCellPropNeg”, “qcFlagsCellComplex”, ” qcFlagsCellArea”, “median_negprobes”, “negprobes_quantile_0.9”, “median_RNA”, “RNA_quantile_0.9”, “nCell”, “nCount”, “nCountPerCell”, “nFeaturePerCell”, “propNegativeCellAvg”, ” complexityCellAvg”, “errorCtPerCellEstimate”, “percOfDataFromErrorPerCell”, “qcFlagsFOV”)
      • Protein colnames = c(“area.qc”, “negprobe.qc”, “high.express.qc”, “low.express.qc”)
    • Failing targets are flagged in target metadata
      • colnames may change due to changes in QC variables
      • RNA colnames = c(“backgroundTestpVals”, “qcFlagsBackground”, “qcFlagsQuantileSummary”)

3.4 Normalization

  • RNA: Normalization is based on the concept of adjusting for library size factors to ensure that cell specific total transcript abundance and distribution of counts, which may vary some between FOVs and, more dramatically, between samples, does not influence downstream visualization and analysis of data. There are two normalization methods available:
    • Seurat (default) uses Seurat::NormalizeData() with their “LogNormalize” default setting
    • Pearson’s residual normalization is based on the estimated mean and variance: (raw counts - mean)/sd
    • The alternative normalization method is based on total count size factors: raw counts/total counts
  • Protein: Normalization is based on the concepts of
    • total intensity to reduce the effect of technical artifacts (e.g. shading/edge effects)
    • arcsinh transformation to improve visualization clarity and stabilize variance
  • Output:
    • TileDB: tiledb_scdataset$somas$RNA_normalized_[GUID]$X$members$data
    • Seurat: seurat[["RNA_normalized_[GUID]"]]@data

3.5 Principal Component Analysis (PCA)

  • Principal component analysis (PCA) provides an orthogonally constrained dimensional reduction analysis of the count data across all cells in a dataset. It produces an output values (PCs) which represent axes of variation within the data which are a combined value of weighted expression within a given cell. PCs are ordered by decreasing variation explained in the data. These can be used to better understand variation within a dataset, but is most commonly used in single-cell analysis as an input for the UMAP analysis.
  • Output:
    • TileDB: tiledb_scdataset$somas$RNA$obsm$members$dimreduction_approximatepca_[GUID]
    • Seurat: seurat@reductions$approximatepca_[GUID]
pca <- as.data.frame(tiledb_scdataset$somas$RNA$obsm$members$dimreduction_pca$to_matrix())
pca$colorBy <- tiledb_scdataset$somas$RNA$obs$to_dataframe(attrs = colorColumn)[[1L]]

pcaPlot(pca, "PCA")

3.6 Uniform Manifold Approximation and Projection (UMAP)

  • UMAP provides a means to visualize high-plex complex datasets in 2-dimensional space using a non-linear approach estimating related groups of cells or features. This method is a common way of visualizing single-cell data to identify clusters of related cells which may be from the same lineage.
  • Output:
    • TileDB: tiledb_scdataset$somas$RNA$obsm$members$dimreduction_approximateumap_[GUID]
    • Seurat: seurat@reductions$approximateumap_[GUID]
umap_TileDB <- as.data.frame(tiledb_scdataset$somas$RNA$obsm$members$dimreduction_approximateumap$to_matrix())
umap_TileDB$colorBy <- tiledb_scdataset$somas$RNA$obs$to_dataframe(attrs = colorColumn)[[1L]]
umapPlot(umap_TileDB, "TileDB - R", size = 0.2)

3.7 InSituType (RNA cell typing)

  • A cell typing algorithm designed for statistical and computational efficiency in spatial transcriptomics data. Insitutype is based on a likelihood model that weighs the evidence from every expression value, extracting all the information available in each cell’s expression profile. This likelihood model underlies a Bayes classifier for supervised cell typing, and an Expectation-Maximization algorithm for unsupervised and semi-supervised clustering. Insitutype also leverages alternative data types collected in spatial studies, such as cell images and spatial context, by using them to inform prior probabilities of cell type calls.
  • Output:
    • Added metadata column with cell type & target log likelyhood profiles
      • colnames = c(“RNA_nbclust_[GUID]_clusters”, “RNA_nbclust_[GUID]_posterior_probability”)
    • TileDB: tiledb_scdataset$somas$RNA$varp$members$RNA_nbclust_[GUID]_log_likelyhood_profiles
    • Seurat: seurat[["RNA"]]@misc$RNA_nbclust_[GUID]_log_likelyhood_profiles
  • Pre-print
  • For this dataset, we used the Liver_HCA profile matrix generated using Human Cell Atlas data. MacParland, S. A. et al. Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations. Nat Commun 9, 4383 (2018).
slideMetadata <- metadata[metadata$Run_Tissue_name == slideName & 
                            metadata$fov %in% c(1:5, 22:26, 43:47),]

xySlidePlot(data = slideMetadata, title = "Cell Types in Space",
            colorBy = "cellType", size = 0.3, alpha = 0.5)

3.8 CELESTA (Protein cell typing)

  • The CELESTA algorithm performs cell typing by taking into account each cell’s marker expression profile and, if necessary, spatial information. Cell typing calls are guided by a signature matrix which specifies the marker(s) known to have high/low expression for each cell type. A bimodal Gaussian mixture model can then be fit to estimate the probability of each cell having “high expression” for each considered marker. When the probability is sufficiently high, a cell is considered an “anchor cell”. When the probability is not sufficiently high to make a high certainty cell type call, the algorithm also considers spatial information by taking into account the cell type calls of neighboring cells. These are considered “index cells”.
  • Output:
    • Added metadata column with cell type
      • colnames = c(“[GUID]_final_cell_type”, “[GUID]_round_1”, “[GUID]_round_2”, “[GUID]_round_3”, “[GUID]_cell_type_number”)
    • TileDB: tiledb_scdataset$somas$RNA$uns$members$[GUID]_celesta.*
    • Seurat: seurat@misc$[GUID]_celesta.* Many slots that contain [GUID]_celesta
  • Paper: Zhang, W., Li, I., Reticker-Flynn, N.E. et al. Identification of cell types in multiplexed in situ images by combining protein expression and spatial information using CELESTA. Nat Methods 19, 759–769 (2022).

3.9 Nearest Neighbor

  • Construct a KNN (k-Nearest Neighbor) graph default is based on the eucledian distance in PCA space and then construct the SNN (Shared Nearest Network) graph with edge weights between any two cells based on the shared overlap in their local neighborhoods (Jaccard distance) and pruning of distant edges.
  • Output:
    • TileDB: tiledb_scdataset$somas$RNA$obsp$members$graph_nn_[GUID]_nn & tiledb_scdataset$somas$RNA$obsp$members$graph_nn_[GUID]_snn
    • Seurat: seurat@graphs$graph_nn_[GUID]_nn & seurat@graphs$graph_nn_[GUID]_snn

3.10 Leiden Clustering

  • Leiden clustering is an unsupervised clustering method that is used to identify groups of cells which are related based on how similar they are in a graph structure. Clusters are defined by moving cells to identify groups of cells that can be aggregated without changing the overall relationship of the graph and looking for unstable nodes which serve as bridges between related communities to help define the boundaries of different clusters.
  • Output:
    • Added metadata column with cluster
      • colnames may change due to changes in clustering variables
      • colnames = “nn_[GUID]_cluster_cluster_[GUID]”
xySlidePlot(data = slideMetadata, title = "Cell Clusters in Space",
            colorBy = "RNA_pca_cluster_default",
            size = 0.3, alpha = 0.5, colorColumn = "Cell Cluster")

3.11 Neighborhood Analysis

  • Identify distinct cellular neighborhood clusters based on cell type composition across tissue. This module helps define the structural composition of a tissue automatically by looking for regional differences in cell type composition. Structures can be repeated structures that are frequently found within a tissue but which are not contiguous (e.g. glomeruli in the kidney, germinal centers in the lymph node) or which are physically connected across a tissue (e.g. epithelial layer in the colon).
  • Output:
    • added metadata columns with niche and neighborhood matrix
      • colnames = c(“RNA_spatialclust_[GUID]_neighbors”, “spatialclust_[GUID]_assignments”)
  • Niches must be named by user, NanoString’s pathologist named these niches.
xySlidePlot(data = slideMetadata, title = "Cell Neighborhoods in Space",
            colorBy = "niche",
            size = 0.3, alpha = 0.5, colorColumn = "Cell Niche")

3.12 Spatial Expression Analysis

  • This module is used to identify genes which have a spatial distribution that is non-uniform/autocorrelated throughout a tissue, and which may be associated with specific tissue structures, microenvironment niches, or cell types. Use this module to look for genes which carry spatially-relevant information. The module also measures associated spatial expression between genes which can be used to group genes into different spatial expression patterns. The two statistics calculated related to spatial expression patterns are Moran’s I and Lee’s L. This module does not assume any specific relationship between structures in the tissue, and genes with significant values should be visualized to determine how they are related to the tissue morphology.
  • Output: Only LeesL outputs for protein datasets
    • var columns RNA_spatialDE_[GUID]_Moran_I, RNA_spatialDE_[GUID]_Moran_pval, RNA_spatialDE_[GUID]_Moran_expected_I, RNA_spatialDE_[GUID]_Moran_seI, & RNA_spatialDE_[GUID]_Moran_padj
    • TileDB: tiledb_scdataset$somas$RNA$obsp$members$spatialDE_[GUID]_W, tiledb_scdataset$somas$RNA$varp$members$RNA_spatialDE_[GUID]_LeesLRes, & tiledb_scdataset$somas$RNA$var
    • Seurat: seurat@graphs$spatialDE_[GUID]_W, seurat[["RNA"]]@misc$RNA_spatialDE_[GUID]_LeesLRes, & seurat[["RNA"]]@meta.features
LeesLRes <- tiledb_scdataset$members$RNA$varp$get_member("RNA_LeesLRes")$to_dataframe()

# reformat the names of genes (e.g. change 4-1BB to 4.1BB)
LeesLRes$var_id_i <- gsub('-', '.', LeesLRes$var_id_i)
LeesLRes$var_id_i <- gsub('/', '.', LeesLRes$var_id_i)
LeesLRes$var_id_j <- gsub('X4.1BB', '4.1BB', LeesLRes$var_id_j)

# convert to square matrix
namegenes <- sort(unique(unlist(LeesLRes[1:2])))
LeesLRes_matrix <- matrix(0, length(namegenes), length(namegenes),
                          dimnames = list(namegenes, namegenes))
LeesLRes_matrix[as.matrix(LeesLRes[c("var_id_i", "var_id_j")])] <-
  LeesLRes[["value"]]

# heatmap
leesHeatmap <- function(spat_lees_l, n_clust) {

  spat_l_hm <- pheatmap::pheatmap(as.matrix(spat_lees_l), scale="none",
                                  color=viridis::viridis(100, option="D"),
                                  border_color=NA, show_rownames=TRUE,
                                  cluster_rows=TRUE, cluster_cols=TRUE,
                                  cutree_rows=n_clust, cutree_cols=n_clust,
                                  main="Lee's L Spatial Association")

  return(spat_l_hm)
}

# leesHeatmap(LeesLRes_matrix, 10)

# we plot the top 20 genes with the largest magnitude off-diagonal values.

plot_n = 20
LeesLRes_off <- LeesLRes[LeesLRes$var_id_i != LeesLRes$var_id_j,]
LeesLRes_off <- LeesLRes_off[order(abs(LeesLRes_off$value), 
                                   decreasing = TRUE),]
plot_genes <- LeesLRes_off[!duplicated(LeesLRes_off$var_id_i),][1:plot_n,
                                                               "var_id_i"]
leesHeatmap(LeesLRes_matrix[plot_genes, plot_genes], 5)

3.13 Cell Type Co-localization

  • This module examines the tendency of different cell types to be located near each other. Each pair of cell types defined from supervised or unsupervised clustering is tested using Ripley’s K-function (a function of the distance between the different cell types) for whether the cells’ spatial distribution differs from a theoretical Poisson point process where a cell’s location is not dependent on another cell’s location. The results are summarized in a heatmap indicating which cell types tend to cluster together or isolate from each other. In addition, a more granular view is shown when plotting the pair correlation function for a given cell type pairing as a function of the radius which can reveal specific distances at which the cells of each type are co-localized.
  • Output:
    • TileDB: tiledb_scdataset$somas$RNA$uns$members$cellproximity_[GUID]
    • Seurat: seurat@misc$cellproximity_[GUID]_cellProximity_heatmap & seurat@misc$cellproximity_[GUID]_cellProximity_pcf & seurat@misc$cellproximity_[GUID]_cellProximity_which
cell_pairs <- tiledb::toMatrix(tiledb_scdataset$members$RNA$uns$members$cellProximity$list_object_uris()[1L])

# generate heatmap
pheatmap::pheatmap(cell_pairs, cluster_cols = TRUE, cluster_rows = TRUE,
                   breaks = seq(-2, 2, length.out = 100),
                   color = colorRampPalette(c("blue", "white", "red"))(100),
                   show_rownames = TRUE, show_colnames = TRUE)

3.14 Marker Genes

  • This module will identify marker genes associated with each cell type or cluster previously identified within a dataset. This module looks for genes which are expressed above background consistently, but also most specifically restricted to each cell type or cluster within the dataset. The module acts on each gene independently. This module may also be used to look for marker genes in neighborhoods that have been identified, but these genes will be related to the overall cellular composition of those neighborhoods.
  • Output:
    • TileDB: tiledb_scdataset$somas$RNA$obsm$members$markerID_[GUID]_cluster_normExpr & tiledb_scdataset$somas$RNA$obsm$members$markerID_[GUID]_markers_result & tiledb_scdataset$somas$RNA$obsm$members$markerID_[GUID]_top_markers
    • Seurat: seurat@misc$markerID_[GUID]_cluster_normExpr & seurat@misc$markerID_[GUID]_markers_result & seurat@misc$markerID_[GUID]_top_markers
# Extract gene X cluster matrix for normalized expression of selected markers
cluster_normExpr <- tiledb_scdataset$somas$RNA$obsm$members$RNA_normalized_cluster_normExpr$to_matrix()
rownames(cluster_normExpr) <- rownames(tiledb_scdataset$somas$RNA$var$to_dataframe())
topMarkers <- as.data.frame(tiledb_scdataset$somas$RNA$obsm$members$RNA_normalized_top_markers$to_matrix())
topMarkers <- topMarkers$feats[as.numeric(aggregate(topMarkers$final_rank, list(topMarkers$cluster), max)$x)]
cluster_normExpr <- cluster_normExpr[rownames(cluster_normExpr) %in%
                                       unique(topMarkers),]

# Heatmap
pheatmap::pheatmap(cluster_normExpr,
                   color = colorRampPalette(c("blue", "white", "red"))(100),
                   scale = 'row',
                   fontsize = 10, cellwidth = 10, border_color = NA,
                   show_rownames = TRUE, show_colnames = TRUE)

3.15 Ligand-Receptor Analysis (RNA only)

  • Score pairs of cells and individual cells for ligand-receptor signaling, and explore results. Ligand-receptor target expression in adjacent cells is used to calculate a co-expression score. A test is then performed to determine if the overall average of the scores for each ligated-receptor pair is enriched by the spatial arrangement of cells.
  • Output:
    • TileDB: tiledb_scdataset$somas$RNA$uns$members$[GUID]_spatEnrichRes & tiledb_scdataset$somas$RNA$uns$members$[GUID]_cellTypePairs & tiledb_scdataset$somas$RNA$uns$members$[GUID]_linkScores& tiledb_scdataset$somas$RNA$uns$members$[GUID]_cellTypeScores & tiledb_scdataset$somas$RNA$obsm$members$[GUID]_cellScores
    • Seurat: seurat@misc$[GUID]_spatEnrichRes & seurat@misc$[GUID]_cellTypePairs & seurat@misc$[GUID]_linkScores & seurat@misc$[GUID]_cellTypeScores & seurat@misc$[GUID]_cellScores
## Retrieval of results
# For cellScores result from obsm folder
cellScores <- tiledb_scdataset$somas$RNA$obsm$members$cellScores$to_matrix()

# For LR results from UNS
uri_use <- tiledb_scdataset$members$RNA$uns$members$cellTypeScores$uri
cellTypeScores <- tiledb::tiledb_array(uri = uri_use, return_as = "data.table")[]

uri_use <- tiledb_scdataset$members$RNA$uns$members$cellTypePairs$uri
cellTypePairs <- tiledb::tiledb_array(uri = uri_use, return_as = "data.table")[]
cellTypePairs <- cellTypePairs[,-1]
cellTypePairs <- as.matrix(cellTypePairs[,-c("fromCT", "toCT"),with=FALSE], 
                          rownames = cellTypePairs[,paste0(fromCT, "_", toCT)])

pheatmap::pheatmap(cellTypePairs[head(order(matrixStats::rowVars(cellTypePairs),
                                            decreasing = TRUE), 25),
                                 head(order(matrixStats::colVars(cellTypePairs),
                                            decreasing = TRUE), 10)])

uri_use <- tiledb_scdataset$members$RNA$uns$members$spatEnrichRes$uri
spatEnrichRes <- tiledb::tiledb_array(uri = uri_use, return_as = "data.table")[]
maxPadj <- 0.05 #significance level
spatEnrichRes$PadjSig <- spatEnrichRes$padj < maxPadj
spatEnrichRes$lrname <- sapply(1:nrow(spatEnrichRes),
                               function(i){paste0(spatEnrichRes$Ligand[i], "/",
                                                  spatEnrichRes$Receptor[i])})
spatEnrichRes$intScoreRatio <- spatEnrichRes$intScore/spatEnrichRes$intScore_simAvg

ggplot(spatEnrichRes, aes(x=intScoreRatio, y=-log10(padj), color=PadjSig)) +
  geom_point(size=2) +
  labs(x = "L-R Interaction Score Ratio",
       y = "-log10(Padj)",
       color = "Significant",
       title = paste0(" Padj < ", maxPadj)) +
  geom_line(aes(y=-log10(maxPadj), color = "black"),linetype="dashed") + 
  scale_color_manual(values = c("TRUE" = "red", "FALSE" = "black")) +
  geom_text_repel(aes(label=lrname)) +
  theme_bw()

3.16 Signaling Pathways (RNA only)

  • Signaling Pathway analysis is calculated on a per-cell basis using gene sets of pre-defined pathways. The method we use calculated relative enrichment of a pathway of the genes within each pathway. Gene sets which do not have sufficient coverage are excluded from analysis.
  • Output:
    • TileDB: tiledb_scdataset$somas$RNA$obsm$members$PathwayCellsAUC_[GUID]
    • Seurat: seurat@misc$PathwayCellsAUC_[GUID]
pathway <- tiledb_scdataset$members$RNA$obsm$members$PathwayCellsAUC$to_matrix()
maxPath <- order(matrixStats::colSds(pathway[rownames(pathway) %in%
                                                slideMetadata$cell_id,]),
                 decreasing = TRUE)[1L]

pathwayMeta <- cbind(metadata, umap_TileDB, maxPathway=pathway[,maxPath])

lims <- range(pathway[,maxPath])
maxPathName <- colnames(pathway)[maxPath]
plot_umap_pathway <- ggplot(pathwayMeta, aes(x = APPROXIMATEUMAP_1 ,
                                            y = APPROXIMATEUMAP_2,
                                            color = maxPathway)) +
  geom_point(size = 0.3, alpha = 0.3) +
  theme_bw() +
  labs(color = maxPathName, title = "UMAP") +
  viridis::scale_color_viridis(limits = lims, option = "A")

plot_xy_pathway <- ggplot(pathwayMeta[rownames(pathwayMeta) %in%
                                        rownames(slideMetadata),],
                         aes(x_slide_mm, y_slide_mm)) +
  geom_point(aes(color = maxPathway), size = 0.2, alpha = 0.3) +
  coord_fixed() +
  viridis::scale_color_viridis(limits = lims, option = "A") +
  labs(color = maxPathName,
       title = paste("Pathway Spatial Plot")) +
  theme_bw()

(plot_umap_pathway | plot_xy_pathway ) +
  plot_layout(guides = "collect", nrow = 2)

3.17 Differential Expression (RNA only)

  • Estimates and summarizes generalized linear (mixed) models for single cell expression. We control for the expression of neighboring cells on a tissue by including ‘neighboring cell expression’ of the analyzed gene as a fixed-effect control variable in the DE model. Controlling for expression in neighboring cells is motivated by the observation that cells in close proximity on a tissue are not independent, and comparisons of DE between groups may be affected by cell-segmentation and cell-type uncertainty. Often, single cell DE analyses may test whether groups are DE within a specific cell-type. In practice, imperfect cell-segmentation can result in ‘contamination’ or ‘bleed-over’ of transcripts from neighboring cells, which can also increase uncertainty in downstream cell-typing analyses. For these reasons, including the expression of the analyzed gene in neighboring cells can be a useful control variable.
  • Output:
    • TileDB: tiledb_scdataset$somas$RNA_normalized$uns$members$[GUID]_prede_* & tiledb_scdataset$somas$RNA_normalized$uns$members$[GUID]_deresults_*
    • Seurat: seurat@misc$[GUID]_preDE... & seurat@misc$[GUID]_de... Many slots that start with preDE and de.
quantile_breaks <- function(xs, n = 10) {
  breaks <- quantile(xs, probs = seq(0, 1, length.out = n))
  breaks[!duplicated(breaks)]
}
all_de_results <- grep("deresults",
                      tiledb_scdataset$members$RNA_normalized$uns$list_member_uris(),
                      value = TRUE)

comparison <- "one.vs.rest" #select comparison
one.vs.rest <- tiledb::tiledb_array(
        uri =grep(comparison, all_de_results, value = TRUE),
        return_as = "data.table")[]

one.vs.rest <- one.vs.rest[which(one.vs.rest$p.value < 0.05),]

one.vs.rest_fc <- reshape2::acast(one.vs.rest, target~contrast,
                                  value.var = "fold_change")
one.vs.rest_fc[is.na(one.vs.rest_fc)] <- 0

one.vs.rest$log10p <- -log10(one.vs.rest$p.value)
one.vs.rest_p <- reshape2::acast(one.vs.rest, target~contrast,
                                 value.var = "log10p")
one.vs.rest_p[is.na(one.vs.rest_p)] <- 0

plot_list <- list()

mat_breaks <- quantile_breaks(one.vs.rest_fc, n = 11)
plot_list[[1L]] <- pheatmap::pheatmap(one.vs.rest_fc,
                   color = colorRampPalette(c("white", "blue"))(length(mat_breaks)-1),
                   fontsize = 10, cellwidth = 15, border_color = NA,
                   show_rownames = TRUE, show_colnames = TRUE,
                   main = "Fold Change", cluster_rows = FALSE,
                   silent = TRUE, breaks = mat_breaks)[[4L]]

plot_list[[2L]] <- pheatmap::pheatmap(one.vs.rest_p,
                   color = colorRampPalette(c("white", "red"))(length(mat_breaks)-1),
                   fontsize = 10, cellwidth = 15, border_color = NA,
                   show_rownames = TRUE, show_colnames = TRUE,
                   main = "-log10(p value)", cluster_rows = FALSE,
                   silent = TRUE, breaks = mat_breaks)[[4L]]

do.call("grid.arrange", c(plot_list, ncol =2))

4 Raw Files Folder Structure

4.1 Most Important Files for Analysis

If Raw Files are exported, they will follow this format. It is recommended to only export Raw Files once per study. If multiple pipelines from the same study export Raw File there could be TBs of duplicated files. For this study, the zipped raw files are over 800 GB. These files are not zipped automatically in the export module.

We will continue to refine and update files for export. In future releases, some of these files may be removed to cut down on exported file size.

4.1.1 RNA

Flowcell Folder/
      Logs/
              SpatialProfiling_[sequence_name].fovs
                      - FOV Coordinates for study, slide number matches SlideNum in next folder
      [sequence_name]_S[slot]/
              plex_[processing_id].csv
                      - Target probes metadata
              CellStatsDir/
                      CellComposite/
                              CellComposite_FOV[FOV].jpg
                                      - Composite Colored Image for each FOV
                      CellOverlay/
                              CellOverlay_[FOV].jpg
                                      - B&W image with cell segmentation overlays for each FOV
                      FOV[FOV]/
                              CellLabels_F[FOV].TIF
                                      - cell segmentation labels
                                      - pixel intensity values correspond with the unique cell_id
                              CompartmentLabels_F[FOV].TIF
                                      - cell subcellular compartment labels
                                      - pixel intensity values correspond with the identified compartment label
                                      - Nuclear = 1, Membrane = 2, Cytoplasmic = 3
                              [run_name]_[sequence_name]_S[slot]_Cell_Stats_F[FOV].csv
                                      - cell info from image like area, centerX,Y coordinates and fluorescence values
                      Morphology2D/
                              [sequence_name]_S[slot]_C[cycle]_P[pool]_N[spot]_F[FOV].TIF
                                      - TIFF of fluorescence values
                      Morphology3D/
                              FOV[FOV]
                                      [sequence_name]_S[slot]_C[cycle]_P[pool]_N[spot]_F[FOV]_Z[z].TIF
                                              - stacked TIFF of fluorescence values at each Z plane
                      RnD/
                              Run_[GUID]_[sequence_name]_S[slot]_Summary_F[FOV].csv
                                      - FOV summary statistics
              RunSummary/
                              Run_[GUID]_[date]_S[slot]_[instrument_name]_ExptConfig.txt
                                      - Instrument Config - includes the pixel to nm ratio
              AnalysisResults/[processing_id]/
                      FOV[FOV]/
                              FOV[FOV]_Analysis_Summary.txt
                                      - Limits of Detection
                              Run_[GUID]_FOV[FOV]__complete_code_cell_target_call_coord.csv
                                      - Target coordinates and counts per cell
                                      - other coord files are intermediate files

4.1.2 Protein

Flowcell Folder/
      Logs/
              SpatialProfiling_[sequence_name].fovs
                      - FOV Coordinates for study, slide number matches SlideNum in next folder
      [Flowcell]_[SlideNum]/
              plex_[processing_id].csv
                      - Target probes metadata
              CellStatsDir/
                      CellComposite/
                              CellComposite_FOV[FOV].jpg
                                      - Composite Colored Image for each FOV
                      CellOverlay/
                              CellOverlay_[FOV].jpg
                                      - B&W image with cell segmentation overlays for each FOV
                      FOV[FOV]/
                              CellLabels_F[FOV].TIF
                                      - cell segmentation labels
                                      - pixel intensity values correspond with the unique cell_id
                              CompartmentLabels_F[FOV].TIF
                                      - cell subcellular compartment labels
                                      - pixel intensity values correspond with the identified compartment label
                                      - Nuclear = 1, Membrane = 2, Cytoplasmic = 3
                              [run_name]_[sequence_name]_S[slot]_Cell_Stats_F[FOV].csv
                                      - cell info from image like area, centerX,Y coordinates and fluorescence values
                      Morphology2D/
                              [sequence_name]_S[slot]_C[cycle]_P[pool]_N[spot]_F[FOV].TIF
                                      - TIFF of fluorescence values
                      Morphology3D/
                              FOV[FOV]
                                      [sequence_name]_S[slot]_C[cycle]_P[pool]_N[spot]_F[FOV]_Z[z].TIF
                                              - stacked TIFF of fluorescence values at each Z plane
                      RnD/
                              Run_[GUID]_[sequence_name]_S[slot]_Summary_F[FOV].csv
                                      - FOV summary statistics
              RunSummary/
                              Run_[GUID]_[date]_S[slot]_[instrument_name]_ExptConfig.txt
                                      - Instrument Config - includes the pixel to nm ratio
              ProteinDir/
                      FOV[FOV]/
                              - Files are used to generate ProteinImages, ProteinMasks, & Protein Stats in AnalysisResults/ folder
              AnalysisResults/[processing_id]/
                      FOV[FOV]/
                              PerCellStats/
                                      [sequence_name]_S[slot]C001_F[FOV][probe_id]_perCell_1ChStats.csv
                                              - Counts per probe_id per cell
                                              - 1 file per protein in panel
                                              - Sum Fluorescence is considered count
                              ProteinImages/
                                      [sequence_name]_S[slot]C001_F[FOV][probe_id].TIF
                                              - TIFF with decoded fluorescence intensity of protein target across FOV
                              ProteinMasks/
                                      [sequence_name]_S[slot]C001_F[FOV][probe_id]_Mask.TIF
                                              - TIFF displaying area of target
                                              - 255 = present, 0 = absent

If you have any questions please reach out to your Applications Scientist or NanoString Support:

For Research Use Only. Not for use in diagnostic procedures.

sessionInfo()
## R version 4.2.3 (2023-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] rminiconda_0.0.1    RcppSpdlog_0.0.12   tiledbsc_0.1.4.9001
##  [4] tiledb_0.17.0       patchwork_1.1.2     pheatmap_1.0.12    
##  [7] reshape2_1.4.4      reticulate_1.28     remotes_2.4.2      
## [10] RColorBrewer_1.1-3  ggpubr_0.6.0        matrixStats_0.63.0 
## [13] sp_1.6-0            SeuratObject_4.1.0  Seurat_4.1.1       
## [16] ggrepel_0.9.3       gridExtra_2.3       ggplot2_3.4.1      
## [19] pryr_0.1.6          BiocStyle_2.26.0   
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.4.1       plyr_1.8.8            igraph_1.4.1         
##   [4] lazyeval_0.2.2        splines_4.2.3         listenv_0.9.0        
##   [7] scattermore_0.8       urltools_1.7.3        digest_0.6.31        
##  [10] htmltools_0.5.4       viridis_0.6.2         fansi_1.0.4          
##  [13] magrittr_2.0.3        tensor_1.5            cluster_2.1.4        
##  [16] ROCR_1.0-11           globals_0.16.2        spatstat.sparse_3.0-1
##  [19] colorspace_2.1-0      lobstr_1.1.2          xfun_0.39            
##  [22] dplyr_1.1.0           jsonlite_1.8.4        progressr_0.13.0     
##  [25] spatstat.data_3.0-1   survival_3.5-3        zoo_1.8-11           
##  [28] glue_1.6.2            RcppCCTZ_0.2.12       polyclip_1.10-4      
##  [31] gtable_0.3.2          leiden_0.4.3          car_3.1-2            
##  [34] future.apply_1.10.0   spdl_0.0.4            abind_1.4-5          
##  [37] scales_1.2.1          rstatix_0.7.2         spatstat.random_3.1-4
##  [40] miniUI_0.1.1.1        Rcpp_1.0.10           viridisLite_0.4.1    
##  [43] xtable_1.8-4          spatstat.core_2.4-4   bit_4.0.5            
##  [46] htmlwidgets_1.6.2     httr_1.4.5            ellipsis_0.3.2       
##  [49] ica_1.0-3             farver_2.1.1          pkgconfig_2.0.3      
##  [52] sass_0.4.5            uwot_0.1.14           deldir_1.0-6         
##  [55] here_1.0.1            utf8_1.2.3            labeling_0.4.2       
##  [58] tidyselect_1.2.0      rlang_1.1.0           later_1.3.0          
##  [61] munsell_0.5.0         tools_4.2.3           cachem_1.0.7         
##  [64] cli_3.6.0             generics_0.1.3        broom_1.0.4          
##  [67] ggridges_0.5.4        evaluate_0.20         stringr_1.5.0        
##  [70] fastmap_1.1.1         yaml_2.3.7            goftest_1.2-3        
##  [73] knitr_1.42            bit64_4.0.5           fs_1.6.1             
##  [76] fitdistrplus_1.1-8    purrr_1.0.1           RANN_2.6.1           
##  [79] pbapply_1.7-0         future_1.32.0         nlme_3.1-162         
##  [82] mime_0.12             compiler_4.2.3        rstudioapi_0.14      
##  [85] plotly_4.10.1         png_0.1-8             ggsignif_0.6.4       
##  [88] spatstat.utils_3.0-2  tibble_3.2.0          bslib_0.4.2          
##  [91] stringi_1.7.12        nanotime_0.3.7        highr_0.10           
##  [94] rgeos_0.6-2           lattice_0.20-45       Matrix_1.5-3         
##  [97] vctrs_0.6.0           pillar_1.9.0          lifecycle_1.0.3      
## [100] BiocManager_1.30.20   triebeard_0.4.1       spatstat.geom_3.1-0  
## [103] lmtest_0.9-40         jquerylib_0.1.4       RcppAnnoy_0.0.20     
## [106] data.table_1.14.8     cowplot_1.1.1         irlba_2.3.5.1        
## [109] httpuv_1.6.9          R6_2.5.1              bookdown_0.34        
## [112] promises_1.2.0.1      KernSmooth_2.23-20    parallelly_1.34.0    
## [115] codetools_0.2-19      MASS_7.3-58.2         rprojroot_2.0.3      
## [118] rjson_0.2.21          withr_2.5.0           sctransform_0.3.5    
## [121] mgcv_1.8-42           parallel_4.2.3        grid_4.2.3           
## [124] rpart_4.1.19          tidyr_1.3.0           rmarkdown_2.20       
## [127] carData_3.0-5         Rtsne_0.16            shiny_1.7.4